1. What is Bootstrapping? Explain in detail how bootstrapping can be performed on a given dataset. [10 pts]

Data collected from a population is limited, because it is not practical to collect data from the whole population. Bootstrapping is a statistical technique that treats the observed sample as a stand-in for the population and draws many new samples from it by random sampling with replacement. The resampled data can then be used to estimate properties of an estimator, such as its standard error, bias, or a confidence interval. How does bootstrapping work? By resampling the sample. For example, if the sample is [1, 2, 3, 4, 5], we resample it many times (typically at least 1000 resamples), each resample having the same size as the original and drawn with replacement, to get resamples such as [2, 2, 3, 5, 1]. We then calculate the statistic of interest (for example, the mean) on each resample and use the distribution of those values to make inferences about the population.
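A minimal NumPy sketch of the procedure above, assuming the statistic of interest is the mean of the example sample [1, 2, 3, 4, 5]. The helper name `bootstrap_means`, the seed, and the 1000 resamples are illustrative choices, not part of the original answer.

```python
import numpy as np

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `sample` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # Each row is one resample: n draws *with replacement* from the original sample.
    idx = rng.integers(0, n, size=(n_resamples, n))
    return sample[idx].mean(axis=1)

sample = [1, 2, 3, 4, 5]
boot = bootstrap_means(sample)

# The spread of the bootstrap means estimates the standard error of the sample mean,
# and the 2.5th/97.5th percentiles give a simple percentile confidence interval.
se = boot.std(ddof=1)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"mean={np.mean(sample):.2f}  bootstrap SE={se:.3f}  95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The percentile interval is only the simplest of several bootstrap confidence intervals; the same loop works for any statistic by replacing `.mean(axis=1)`.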

https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#Advantages

2. What are advantages and disadvantages of Bootstrapping? [5 Pts]

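The advantage of bootstrapping is that it is very easy to apply. Also, it can generate many resamples when only one limited sample is available, although those resamples contain no new information beyond the original sample. The disadvantage of bootstrapping is that it depends on several assumptions. First, the sample needs to be big enough to represent the population. Second, the population is assumed to be effectively infinite (much larger than the sample). There are also some additional assumptions, such as the statistic being approximately linear or smooth.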
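The smoothness assumption can be illustrated with a statistic where the bootstrap is known to behave poorly: the sample maximum. This sketch assumes a uniform population on [0, 1]; the sample size of 50 and the 2000 resamples are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
sample = rng.uniform(0, 1, size=n)  # true population maximum is 1.0

# Bootstrap the sample maximum, a non-smooth statistic.
idx = rng.integers(0, n, size=(2000, n))
boot_max = sample[idx].max(axis=1)

# No resample can exceed the observed maximum, and a large share of resamples
# reproduce it exactly, so the bootstrap distribution is a poor stand-in for
# the true sampling distribution of the maximum.
share_equal = np.mean(boot_max == sample.max())
print(f"sample max={sample.max():.3f}, share of resamples equal to it={share_equal:.2f}")
```

The share is close to 1 - (1 - 1/n)^n, about 0.63 here, which is the probability that the largest observation is drawn at least once.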

https://influentialpoints.com/Training/bootstrap_confidence_intervals-principles-properties-assumptions.htm#asum

3. Explain the differences between the bootstrapping method and cross validation. How can both of those methods be used for Model Selection? [10 Pts]

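The bootstrapping method draws resamples from the sample with replacement, each the same size as the original. Cross validation splits the sample into a training set and a testing set without replacement; in k-fold cross validation every observation is used for testing exactly once. For model selection, cross validation fits each candidate model on the training folds, measures its error on the held-out fold, and selects the model with the lowest average error. Bootstrapping can be used in a similar way: each candidate model is fit on a bootstrap resample and evaluated on the out-of-bag observations that were not drawn (about 37% of the data on average), and the model with the lowest average out-of-bag error is selected.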
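Both selection schemes can be sketched side by side with scikit-learn. This is an illustration under assumptions: the data are synthetic (`make_regression`), the candidate models are ridge regressions with three hypothetical penalty values, and the bootstrap score is the plain out-of-bag error (not the .632 correction).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=15.0, random_state=0)
candidates = {f"ridge(alpha={a})": Ridge(alpha=a) for a in (0.1, 10.0, 1000.0)}

def cv_mse(model):
    # 5-fold CV: every observation is used for testing exactly once (no replacement).
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    return -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error").mean()

def oob_bootstrap_mse(model, n_boot=50, seed=0):
    # Bootstrap: fit on a resample drawn with replacement, test on the
    # out-of-bag rows that were never drawn (roughly 37% of the data).
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        model.fit(X[idx], y[idx])
        errors.append(mean_squared_error(y[oob], model.predict(X[oob])))
    return float(np.mean(errors))

results = {name: (cv_mse(m), oob_bootstrap_mse(m)) for name, m in candidates.items()}
for name, (cv, boot) in results.items():
    print(f"{name:18s}  CV MSE={cv:9.1f}  bootstrap OOB MSE={boot:9.1f}")

best_cv = min(results, key=lambda k: results[k][0])
best_boot = min(results, key=lambda k: results[k][1])
print("selected by CV:", best_cv, "| selected by bootstrap:", best_boot)
```

In both cases the model that is never trained on its evaluation rows is scored, which is what makes the comparison fair.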

https://datascience.stackexchange.com/questions/32264/what-is-the-difference-between-bootstrapping-and-cross-validation

https://zhuanlan.zhihu.com/p/53664662

4. Explain what Ensemble Learning and Bagging are. [5 Pts]

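Ensemble Learning combines multiple models to solve a particular computational intelligence problem, aiming for better predictive performance than any single model. Bagging (bootstrap aggregating) is one Ensemble Learning method: each model is trained on a different bootstrap resample of the training data, and the models' predictions are combined with the same weight, by majority vote for classification or by averaging for regression. Random forest is one example of bagging.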
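The two ingredients of bagging (bootstrap resamples and an equal-weight vote) can be sketched from scratch with scikit-learn decision trees. The synthetic data and the choice of 25 trees are assumptions for illustration; scikit-learn also provides `BaggingClassifier` for the same purpose.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_models = 25  # odd, so a binary vote can never tie
trees = []
for _ in range(n_models):
    # Each model sees its own bootstrap resample of the training set.
    idx = rng.integers(0, len(y_train), size=len(y_train))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Equal-weight combination: every tree gets one vote; the majority class wins.
votes = np.array([t.predict(X_test) for t in trees])  # shape (n_models, n_test)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int)  # binary labels 0/1

single_acc = (trees[0].predict(X_test) == y_test).mean()
bagged_acc = (bagged_pred == y_test).mean()
print(f"single tree accuracy={single_acc:.3f}, bagged accuracy={bagged_acc:.3f}")
```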

https://blog.csdn.net/qq_36330643/article/details/77621232

http://www.scholarpedia.org/article/Ensemble_learning

5. What are different classification techniques available, and how does a Random Forest work? [Please write in at least 5 sentences] [10 pts].

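There are many different classification techniques, such as Naive Bayes, Nearest Neighbor, Support Vector Machines, and Decision Trees. Random forest is an example of bagging in which every model is a decision tree. Each tree is trained on a bootstrap resample of the training data, and at each split it considers only a random subset of the features, which makes the trees less correlated with one another. The trees are usually grown deep without pruning, so each one has low bias but high variance. The final result is decided by majority vote of the trees for classification (scikit-learn averages the trees' predicted class probabilities instead) or by averaging their predictions for regression, which reduces the variance of the ensemble.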
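The mechanics described above map onto `RandomForestClassifier` parameters, and the combination step can be checked by hand. This sketch uses synthetic data (an assumption) and confirms that, in scikit-learn, the forest's probabilities are the average of its trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=12, n_informative=5, random_state=0)

forest = RandomForestClassifier(
    n_estimators=51,
    max_features="sqrt",  # random subset of features considered at each split
    bootstrap=True,       # each tree is trained on a bootstrap resample
    random_state=0,
).fit(X, y)

# scikit-learn combines the trees by averaging their class probabilities
# (soft voting), then predicts the class with the highest average.
tree_probs = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
manual_pred = forest.classes_[tree_probs.argmax(axis=1)]

print(len(forest.estimators_), "trees; manual combination matches forest.predict:",
      np.array_equal(manual_pred, forest.predict(X)))
```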

6. What are some of the advantages and disadvantages of Random Forests? [5 pts]

Advantages of Random Forest: Random forest works for both classification and regression. It can process large datasets with high dimensionality. Depending on the implementation, it can also handle missing data.

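Disadvantages of Random Forest: Regression problems are not handled as well as classification problems. In particular, a regression forest predicts averages of training target values, so it cannot predict values outside the range of the training data. It may also overfit noisy data, because the fully grown trees try to fit the training data as closely as possible.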
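The extrapolation limit can be seen on a toy problem. This sketch assumes a synthetic linear trend y = 2x observed only for x in [0, 10]; the specific numbers are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Training data covers x in [0, 10] with a simple linear trend y = 2x.
x_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 2 * x_train.ravel()

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(x_train, y_train)

# Ask for predictions inside and well outside the training range.
x_new = np.array([[5.0], [20.0], [50.0]])
pred = forest.predict(x_new)
print(pred)  # predictions for x=20 and x=50 cannot exceed 20.0, the largest training target
```

A linear model would predict about 40 and 100 for the last two points; the forest returns the same value for both because they fall into the same rightmost leaves.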

https://www.quora.com/What-are-the-advantages-and-disadvantages-for-a-random-forest-algorithm

7. What are different subset selection based methods? Explain the difference between subset selection based methods and cross validation. [Explain with example] [10 Pts]

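Best subset selection finds the best subset of each size, then compares the best subsets of different sizes to get the final result. Greedy (stepwise) selection, such as forward selection, limits the search: it starts with no variables and adds only one variable to the model at a time, which greatly reduces the computational overhead compared with trying every subset. Cross validation, by contrast, does not reduce the number of features by itself; it estimates how accurate a model will be on unseen data, so it is used to compare the candidate subsets produced by a selection method. For example, forward selection may propose models with 1, 2, 3, ... features, and cross validation then picks the size with the highest validation accuracy.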
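The division of labour (selection method proposes subsets, cross validation scores them) can be sketched as greedy forward selection with 5-fold CV. The synthetic data, the linear model, the helper name `cv_r2`, and the 1e-4 stopping tolerance are assumptions; scikit-learn's `SequentialFeatureSelector` implements a similar idea.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# 8 candidate features, only 3 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)

def cv_r2(cols):
    # Mean 5-fold cross-validated R^2 of a linear model using only `cols` (hypothetical helper).
    return cross_val_score(LinearRegression(), X[:, cols], y, cv=5).mean()

selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    # Greedy step: try adding each remaining feature and keep the best one.
    scores = {j: cv_r2(selected + [j]) for j in remaining}
    j_best = max(scores, key=scores.get)
    if scores[j_best] <= best_score + 1e-4:  # stop when CV no longer improves
        break
    best_score = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print("selected features:", selected, "CV R^2 = %.4f" % best_score)
```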

https://en.wikipedia.org/wiki/Feature_selection#Subset_selection

https://blog.csdn.net/youngmilk/article/details/70339347

In [ ]:
import pandas as pd
import numpy as np

8. Please perform Random Forest prediction on the given dataset (data.csv) below. [40 Points] ["valence" is the target variable given in the dataset] Dataset Link: Uploaded as an attachment in iCollege

15 Points for Data Preprocessing. 15 Points for Random Forest along with suitable plots at the end and results explanation. 10 Points for applying cross validation on the given data and showing all the accuracies for each split. Please mention which model is the best to use.

Data Preprocessing

In [2]:
df = pd.read_csv("C:\\Users\\Fan_2019\\Downloads\\Data Mining\\HW5\\random_forest_data.csv")
df
Out[2]:
Unnamed: 0 acousticness analysis_url danceability duration_ms energy id instrumentalness key liveness loudness mode speechiness tempo time_signature track_href type uri valence
0 0 0.40000 https://api.spotify.com/v1/audio-analysis/3AEZ... 0.761 222560 0.838 3AEZUABDXNtecAOSC1qTfo 0.000000 4 0.1760 -3.073 0 0.0502 93.974 4 https://api.spotify.com/v1/tracks/3AEZUABDXNte... audio_features spotify:track:3AEZUABDXNtecAOSC1qTfo 0.7100
1 1 0.18700 https://api.spotify.com/v1/audio-analysis/6mIC... 0.852 195840 0.773 6mICuAdrwEjh6Y6lroV2Kg 0.000030 8 0.1590 -2.921 0 0.0776 102.034 4 https://api.spotify.com/v1/tracks/6mICuAdrwEjh... audio_features spotify:track:6mICuAdrwEjh6Y6lroV2Kg 0.9070
2 2 0.05590 https://api.spotify.com/v1/audio-analysis/3QwB... 0.832 209453 0.772 3QwBODjSEzelZyVjxPOHdq 0.000486 10 0.4400 -5.429 1 0.1000 96.016 4 https://api.spotify.com/v1/tracks/3QwBODjSEzel... audio_features spotify:track:3QwBODjSEzelZyVjxPOHdq 0.7040
3 3 0.00431 https://api.spotify.com/v1/audio-analysis/7DM4... 0.663 259196 0.920 7DM4BPaS7uofFul3ywMe46 0.000017 11 0.1010 -4.070 0 0.2260 99.935 4 https://api.spotify.com/v1/tracks/7DM4BPaS7uof... audio_features spotify:track:7DM4BPaS7uofFul3ywMe46 0.5330
4 4 0.55100 https://api.spotify.com/v1/audio-analysis/6rQS... 0.508 205600 0.687 6rQSrBHf7HlZjtcMZ4S4bO 0.000003 0 0.1260 -4.361 1 0.3260 180.044 4 https://api.spotify.com/v1/tracks/6rQSrBHf7HlZ... audio_features spotify:track:6rQSrBHf7HlZjtcMZ4S4bO 0.5550
5 5 0.19800 https://api.spotify.com/v1/audio-analysis/0sXv... 0.736 227707 0.964 0sXvAOmXgjR2QUqLK1MltU 0.000002 0 0.3360 -2.147 1 0.1290 179.935 4 https://api.spotify.com/v1/tracks/0sXvAOmXgjR2... audio_features spotify:track:0sXvAOmXgjR2QUqLK1MltU 0.9530
6 6 0.16700 https://api.spotify.com/v1/audio-analysis/6stY... 0.761 252003 0.829 6stYbAJgTszHAHZMPxWWCY 0.000000 0 0.1890 -3.203 0 0.0681 92.033 4 https://api.spotify.com/v1/tracks/6stYbAJgTszH... audio_features spotify:track:6stYbAJgTszHAHZMPxWWCY 0.8130
7 7 0.02440 https://api.spotify.com/v1/audio-analysis/5mey... 0.680 247493 0.954 5mey7CLLuFToM2P68Qu1gF 0.000000 9 0.1120 -1.823 1 0.1190 104.029 4 https://api.spotify.com/v1/tracks/5mey7CLLuFTo... audio_features spotify:track:5mey7CLLuFToM2P68Qu1gF 0.5210
8 8 0.14200 https://api.spotify.com/v1/audio-analysis/5J1c... 0.776 228467 0.669 5J1c3M4EldCfNxXwrwt8mT 0.000000 11 0.2190 -4.933 1 0.0638 91.012 4 https://api.spotify.com/v1/tracks/5J1c3M4EldCf... audio_features spotify:track:5J1c3M4EldCfNxXwrwt8mT 0.6610
9 9 0.07600 https://api.spotify.com/v1/audio-analysis/58IL... 0.899 234320 0.626 58IL315gMSTD37DOZPJ2hf 0.000000 6 0.0631 -4.228 0 0.2920 88.007 4 https://api.spotify.com/v1/tracks/58IL315gMSTD... audio_features spotify:track:58IL315gMSTD37DOZPJ2hf 0.8730
10 10 0.25600 https://api.spotify.com/v1/audio-analysis/3dQD... 0.772 238800 0.909 3dQDid3IUNhZy1OehIfYfE 0.000000 6 0.2550 -3.225 0 0.1660 96.031 4 https://api.spotify.com/v1/tracks/3dQDid3IUNhZ... audio_features spotify:track:3dQDid3IUNhZy1OehIfYfE 0.6940
11 11 0.09980 https://api.spotify.com/v1/audio-analysis/20ZA... 0.721 226400 0.687 20ZAJdsKB5IGbGj4ilRt2o 0.000000 1 0.0679 -6.682 1 0.0782 175.914 4 https://api.spotify.com/v1/tracks/20ZAJdsKB5IG... audio_features spotify:track:20ZAJdsKB5IGbGj4ilRt2o 0.8250
12 12 0.07840 https://api.spotify.com/v1/audio-analysis/4pdP... 0.476 205947 0.718 4pdPtRcBmOSQDlJ3Fk945m 0.000010 8 0.1220 -5.309 1 0.0576 199.864 4 https://api.spotify.com/v1/tracks/4pdPtRcBmOSQ... audio_features spotify:track:4pdPtRcBmOSQDlJ3Fk945m 0.1420
13 13 0.07860 https://api.spotify.com/v1/audio-analysis/6YZd... 0.724 200813 0.904 6YZdkObH88npeKrrkb8Ggf 0.000000 8 0.2260 -3.354 0 0.0966 90.999 4 https://api.spotify.com/v1/tracks/6YZdkObH88np... audio_features spotify:track:6YZdkObH88npeKrrkb8Ggf 0.8460
14 14 0.07240 https://api.spotify.com/v1/audio-analysis/1lxs... 0.827 197840 0.646 1lxswgIpzV6HhENRvkflES 0.000000 1 0.2470 -4.727 0 0.0766 92.057 4 https://api.spotify.com/v1/tracks/1lxswgIpzV6H... audio_features spotify:track:1lxswgIpzV6HhENRvkflES 0.5120
15 15 0.13200 https://api.spotify.com/v1/audio-analysis/6DUd... 0.730 207307 0.701 6DUdDIRgLqCGq1DwkNWQTN 0.000000 5 0.1510 -5.885 0 0.1060 175.950 4 https://api.spotify.com/v1/tracks/6DUdDIRgLqCG... audio_features spotify:track:6DUdDIRgLqCGq1DwkNWQTN 0.7850
16 16 0.00784 https://api.spotify.com/v1/audio-analysis/1xzn... 0.791 173987 0.619 1xznGGDReH1oQq0xzbwXa3 0.004230 1 0.3510 -5.886 1 0.0532 103.989 4 https://api.spotify.com/v1/tracks/1xznGGDReH1o... audio_features spotify:track:1xznGGDReH1oQq0xzbwXa3 0.3710
17 17 0.41400 https://api.spotify.com/v1/audio-analysis/7BKL... 0.748 244960 0.524 7BKLCZ1jbUBVqRi2FVlTVw 0.000000 8 0.1110 -5.599 1 0.0338 95.010 4 https://api.spotify.com/v1/tracks/7BKLCZ1jbUBV... audio_features spotify:track:7BKLCZ1jbUBVqRi2FVlTVw 0.6610
18 18 0.16500 https://api.spotify.com/v1/audio-analysis/5aAx... 0.681 230453 0.594 5aAx2yezTd8zXrkmtKl66Z 0.000003 7 0.1340 -7.028 1 0.2820 186.054 4 https://api.spotify.com/v1/tracks/5aAx2yezTd8z... audio_features spotify:track:5aAx2yezTd8zXrkmtKl66Z 0.5350
19 19 0.17100 https://api.spotify.com/v1/audio-analysis/1pWY... 0.780 227693 0.929 1pWYnQIlqxTh5bxuPmTG4E 0.000000 11 0.0677 -0.739 0 0.0532 95.012 4 https://api.spotify.com/v1/tracks/1pWYnQIlqxTh... audio_features spotify:track:1pWYnQIlqxTh5bxuPmTG4E 0.8370
20 20 0.33800 https://api.spotify.com/v1/audio-analysis/5MFz... 0.783 214480 0.623 5MFzQMkrl1FOOng9tq6R9r 0.000000 7 0.0975 -6.126 1 0.0800 100.048 4 https://api.spotify.com/v1/tracks/5MFzQMkrl1FO... audio_features spotify:track:5MFzQMkrl1FOOng9tq6R9r 0.4470
21 21 0.16600 https://api.spotify.com/v1/audio-analysis/0OMR... 0.616 203160 0.989 0OMRAvrtLWE2TvcXorRiB9 0.000000 9 0.1720 -1.698 0 0.0483 95.036 4 https://api.spotify.com/v1/tracks/0OMRAvrtLWE2... audio_features spotify:track:0OMRAvrtLWE2TvcXorRiB9 0.9020
22 22 0.03400 https://api.spotify.com/v1/audio-analysis/6b8B... 0.818 225983 0.803 6b8Be6ljOzmkOmFslEb23P 0.000000 1 0.1530 -4.282 1 0.0797 106.970 4 https://api.spotify.com/v1/tracks/6b8Be6ljOzmk... audio_features spotify:track:6b8Be6ljOzmkOmFslEb23P 0.6320
23 23 0.54700 https://api.spotify.com/v1/audio-analysis/5hEM... 0.760 210323 0.838 5hEM0JchdVzQ5PwvSfITeX 0.000001 7 0.0664 -3.828 0 0.0529 93.050 4 https://api.spotify.com/v1/tracks/5hEM0JchdVzQ... audio_features spotify:track:5hEM0JchdVzQ5PwvSfITeX 0.7450
24 24 0.09830 https://api.spotify.com/v1/audio-analysis/0JoH... 0.772 278107 0.547 0JoHqmlqE0W0i9prt6kcHR 0.000004 6 0.1070 -6.373 0 0.0541 130.099 4 https://api.spotify.com/v1/tracks/0JoHqmlqE0W0... audio_features spotify:track:0JoHqmlqE0W0i9prt6kcHR 0.0957
25 25 0.09350 https://api.spotify.com/v1/audio-analysis/1vvN... 0.819 219160 0.913 1vvNmPOiUuyCbgWmtc6yfm 0.000037 4 0.1610 -3.059 0 0.0427 119.989 4 https://api.spotify.com/v1/tracks/1vvNmPOiUuyC... audio_features spotify:track:1vvNmPOiUuyCbgWmtc6yfm 0.5360
26 26 0.02720 https://api.spotify.com/v1/audio-analysis/1mSz... 0.599 214867 0.838 1mSzZKMIPZIkn3jFV7v62b 0.000000 0 0.3300 -5.160 1 0.0996 128.854 4 https://api.spotify.com/v1/tracks/1mSzZKMIPZIk... audio_features spotify:track:1mSzZKMIPZIkn3jFV7v62b 0.8470
27 27 0.08660 https://api.spotify.com/v1/audio-analysis/772i... 0.875 196360 0.709 772io14jrMTxGJbijtDccQ 0.000017 6 0.0604 -6.692 0 0.0923 99.948 4 https://api.spotify.com/v1/tracks/772io14jrMTx... audio_features spotify:track:772io14jrMTxGJbijtDccQ 0.7930
28 28 0.06730 https://api.spotify.com/v1/audio-analysis/1AkT... 0.723 224320 0.889 1AkTW13ysu0AJrwuM6UY0I 0.000000 0 0.3630 -4.017 1 0.0939 101.032 4 https://api.spotify.com/v1/tracks/1AkTW13ysu0A... audio_features spotify:track:1AkTW13ysu0AJrwuM6UY0I 0.7010
29 29 0.62100 https://api.spotify.com/v1/audio-analysis/68EM... 0.728 217707 0.563 68EMU2RD1ECNeOeJ5qAXCV 0.000000 1 0.1790 -8.053 0 0.1340 100.017 4 https://api.spotify.com/v1/tracks/68EMU2RD1ECN... audio_features spotify:track:68EMU2RD1ECNeOeJ5qAXCV 0.3520
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
75770 75770 0.57700 https://api.spotify.com/v1/audio-analysis/7lpf... 0.625 240187 0.575 7lpfmAMzW3MlWSnXrwAGky 0.106000 0 0.0994 -10.557 0 0.0740 92.028 4 https://api.spotify.com/v1/tracks/7lpfmAMzW3Ml... audio_features spotify:track:7lpfmAMzW3MlWSnXrwAGky 0.3960
75771 75771 0.01320 https://api.spotify.com/v1/audio-analysis/54vt... 0.550 195973 0.810 54vtz2MeGYtbvOG440hsYQ 0.000000 3 0.6910 -5.019 1 0.0350 123.987 4 https://api.spotify.com/v1/tracks/54vtz2MeGYtb... audio_features spotify:track:54vtz2MeGYtbvOG440hsYQ 0.6360
75772 75772 0.13400 https://api.spotify.com/v1/audio-analysis/0fQI... 0.737 235867 0.635 0fQIB0rVuLNDNlq3B0hij9 0.003730 8 0.1080 -8.555 1 0.0344 117.011 4 https://api.spotify.com/v1/tracks/0fQIB0rVuLND... audio_features spotify:track:0fQIB0rVuLNDNlq3B0hij9 0.7870
75773 75773 0.22900 https://api.spotify.com/v1/audio-analysis/7xyy... 0.854 227399 0.556 7xyyjOyiYVJCT3CmJl7HwW 0.000002 8 0.4100 -7.605 1 0.0778 136.022 4 https://api.spotify.com/v1/tracks/7xyyjOyiYVJC... audio_features spotify:track:7xyyjOyiYVJCT3CmJl7HwW 0.3440
75774 75774 0.90600 https://api.spotify.com/v1/audio-analysis/0ndl... 0.527 262373 0.346 0ndln0GEXxhuwj3sVzFqXO 0.000000 4 0.0845 -7.539 1 0.0318 123.961 4 https://api.spotify.com/v1/tracks/0ndln0GEXxhu... audio_features spotify:track:0ndln0GEXxhuwj3sVzFqXO 0.4010
75775 75775 0.22200 https://api.spotify.com/v1/audio-analysis/64I0... 0.772 180067 0.582 64I0PKLFEKlcvc7fEVUGq0 0.000088 2 0.1080 -5.140 0 0.0615 117.856 4 https://api.spotify.com/v1/tracks/64I0PKLFEKlc... audio_features spotify:track:64I0PKLFEKlcvc7fEVUGq0 0.3190
75776 75776 0.08170 https://api.spotify.com/v1/audio-analysis/2mFV... 0.604 223469 0.898 2mFVt6vCOcR71lw3kY2UZB 0.002310 8 0.1230 -3.438 0 0.0657 144.958 4 https://api.spotify.com/v1/tracks/2mFVt6vCOcR7... audio_features spotify:track:2mFVt6vCOcR71lw3kY2UZB 0.6830
75777 75777 0.18400 https://api.spotify.com/v1/audio-analysis/2yYf... 0.635 211413 0.578 2yYfWku1ndnxenRK318Ihz 0.000000 0 0.0874 -6.738 1 0.1190 150.057 4 https://api.spotify.com/v1/tracks/2yYfWku1ndnx... audio_features spotify:track:2yYfWku1ndnxenRK318Ihz 0.3090
75778 75778 0.00383 https://api.spotify.com/v1/audio-analysis/3Emm... 0.647 198774 0.932 3EmmCZoqpWOTY1g2GBwJoR 0.000002 11 0.0574 -3.515 1 0.0824 114.991 4 https://api.spotify.com/v1/tracks/3EmmCZoqpWOT... audio_features spotify:track:3EmmCZoqpWOTY1g2GBwJoR 0.3740
75779 75779 0.27300 https://api.spotify.com/v1/audio-analysis/43jB... 0.781 193181 0.570 43jBqV3j3Xi1g6wO0bhIMd 0.000000 11 0.1960 -5.874 0 0.1880 107.059 4 https://api.spotify.com/v1/tracks/43jBqV3j3Xi1... audio_features spotify:track:43jBqV3j3Xi1g6wO0bhIMd 0.8580
75780 75780 0.21400 https://api.spotify.com/v1/audio-analysis/1PSB... 0.697 239293 0.691 1PSBzsahR2AKwLJgx8ehBj 0.000000 2 0.1850 -4.757 1 0.1460 137.853 4 https://api.spotify.com/v1/tracks/1PSBzsahR2AK... audio_features spotify:track:1PSBzsahR2AKwLJgx8ehBj 0.3050
75781 75781 0.10400 https://api.spotify.com/v1/audio-analysis/7of1... 0.880 206099 0.605 7of1slAJIWrXxP8ikAHxje 0.004880 5 0.1060 -7.639 0 0.0531 120.017 4 https://api.spotify.com/v1/tracks/7of1slAJIWrX... audio_features spotify:track:7of1slAJIWrXxP8ikAHxje 0.6260
75782 75782 0.51000 https://api.spotify.com/v1/audio-analysis/2D5H... 0.851 264853 0.625 2D5HeB7Arf6IwCQCdoPBqE 0.000000 5 0.1130 -5.358 0 0.3800 134.751 4 https://api.spotify.com/v1/tracks/2D5HeB7Arf6I... audio_features spotify:track:2D5HeB7Arf6IwCQCdoPBqE 0.3820
75783 75783 0.17500 https://api.spotify.com/v1/audio-analysis/2wJ9... 0.879 171350 0.697 2wJ9TGvwOXWU9iIA9SjJch 0.000000 8 0.1270 -5.627 0 0.0520 131.973 4 https://api.spotify.com/v1/tracks/2wJ9TGvwOXWU... audio_features spotify:track:2wJ9TGvwOXWU9iIA9SjJch 0.8840
75784 75784 0.66900 https://api.spotify.com/v1/audio-analysis/1VdZ... 0.642 258373 0.289 1VdZ0vKfR5jneCmWIUAMxK 0.000000 9 0.1800 -9.918 1 0.0367 84.996 4 https://api.spotify.com/v1/tracks/1VdZ0vKfR5jn... audio_features spotify:track:1VdZ0vKfR5jneCmWIUAMxK 0.4070
75785 75785 0.18800 https://api.spotify.com/v1/audio-analysis/4Otg... 0.692 216747 0.675 4Otg8MXqP8UK3QzcMlY1sx 0.000002 9 0.1140 -8.033 0 0.0687 99.988 4 https://api.spotify.com/v1/tracks/4Otg8MXqP8UK... audio_features spotify:track:4Otg8MXqP8UK3QzcMlY1sx 0.3320
75786 75786 0.09160 https://api.spotify.com/v1/audio-analysis/2nEo... 0.818 211613 0.734 2nEoMpAOMd9ssMeuFhPcOh 0.000183 6 0.0960 -5.239 0 0.0561 106.042 4 https://api.spotify.com/v1/tracks/2nEoMpAOMd9s... audio_features spotify:track:2nEoMpAOMd9ssMeuFhPcOh 0.5400
75787 75787 0.75300 https://api.spotify.com/v1/audio-analysis/2hg9... 0.873 171486 0.681 2hg9CrPl2xXw10gVEsdGQu 0.003390 2 0.0754 -5.584 1 0.0454 96.015 4 https://api.spotify.com/v1/tracks/2hg9CrPl2xXw... audio_features spotify:track:2hg9CrPl2xXw10gVEsdGQu 0.8330
75788 75788 0.15600 https://api.spotify.com/v1/audio-analysis/2aFi... 0.774 219147 0.764 2aFiaMXmWsM3Vj72F9ksBl 0.000000 4 0.0383 -5.445 0 0.0518 118.997 4 https://api.spotify.com/v1/tracks/2aFiaMXmWsM3... audio_features spotify:track:2aFiaMXmWsM3Vj72F9ksBl 0.9120
75789 75789 0.61300 https://api.spotify.com/v1/audio-analysis/3jpd... 0.451 199160 0.252 3jpdfoWVvhMDVkVrJmiVOc 0.000682 2 0.1150 -11.626 1 0.0288 80.158 4 https://api.spotify.com/v1/tracks/3jpdfoWVvhMD... audio_features spotify:track:3jpdfoWVvhMDVkVrJmiVOc 0.1150
75790 75790 0.07780 https://api.spotify.com/v1/audio-analysis/1D3O... 0.696 201254 0.743 1D3ODoXHBLpdxolZRHWV1j 0.000000 6 0.0581 -3.838 1 0.0331 122.978 4 https://api.spotify.com/v1/tracks/1D3ODoXHBLpd... audio_features spotify:track:1D3ODoXHBLpdxolZRHWV1j 0.6440
75791 75791 0.03570 https://api.spotify.com/v1/audio-analysis/7Lrt... 0.861 232801 0.622 7LrtJqLsLBoFvf1qUb2yLt 0.000051 8 0.0669 -4.759 0 0.0542 129.984 4 https://api.spotify.com/v1/tracks/7LrtJqLsLBoF... audio_features spotify:track:7LrtJqLsLBoFvf1qUb2yLt 0.7100
75792 75792 0.57900 https://api.spotify.com/v1/audio-analysis/0y1Q... 0.489 213707 0.505 0y1QJc3SJVPKJ1OvFmFqe6 0.000333 10 0.1040 -8.022 0 0.1170 163.255 4 https://api.spotify.com/v1/tracks/0y1QJc3SJVPK... audio_features spotify:track:0y1QJc3SJVPKJ1OvFmFqe6 0.3370
75793 75793 0.28600 https://api.spotify.com/v1/audio-analysis/2qoi... 0.511 250480 0.214 2qoiQ8OyAZHiAWTpq5nR5v 0.002320 9 0.1170 -12.902 0 0.0951 149.928 4 https://api.spotify.com/v1/tracks/2qoiQ8OyAZHi... audio_features spotify:track:2qoiQ8OyAZHiAWTpq5nR5v 0.1580
75794 75794 0.88000 https://api.spotify.com/v1/audio-analysis/1vyr... 0.402 253013 0.539 1vyrrrnb7182SL6iOMor3O 0.006500 0 0.1170 -11.062 1 0.0430 146.526 4 https://api.spotify.com/v1/tracks/1vyrrrnb7182... audio_features spotify:track:1vyrrrnb7182SL6iOMor3O 0.3710
75795 75795 0.20900 https://api.spotify.com/v1/audio-analysis/0CJ3... 0.842 206787 0.777 0CJ31BEjjl1tPIj0CKi9kH 0.000009 6 0.2290 -3.869 0 0.2260 120.081 4 https://api.spotify.com/v1/tracks/0CJ31BEjjl1t... audio_features spotify:track:0CJ31BEjjl1tPIj0CKi9kH 0.7510
75796 75796 0.49700 https://api.spotify.com/v1/audio-analysis/6l83... 0.917 249001 0.739 6l83gaj4INg0DoQAc0A1NL 0.005040 1 0.0956 -6.005 0 0.1300 132.022 4 https://api.spotify.com/v1/tracks/6l83gaj4INg0... audio_features spotify:track:6l83gaj4INg0DoQAc0A1NL 0.8050
75797 75797 0.08800 https://api.spotify.com/v1/audio-analysis/1ZKd... 0.926 198824 0.551 1ZKdjffRB9iwbKrUXVXl05 0.000000 2 0.1080 -6.679 1 0.2700 131.972 4 https://api.spotify.com/v1/tracks/1ZKdjffRB9iw... audio_features spotify:track:1ZKdjffRB9iwbKrUXVXl05 0.5280
75798 75798 0.06410 https://api.spotify.com/v1/audio-analysis/7rqy... 0.596 206394 0.897 7rqyZM53JpYj86avtfmyeg 0.000052 10 0.0628 -2.940 1 0.0462 118.000 4 https://api.spotify.com/v1/tracks/7rqyZM53JpYj... audio_features spotify:track:7rqyZM53JpYj86avtfmyeg 0.5780
75799 75799 0.08520 https://api.spotify.com/v1/audio-analysis/5BAp... 0.771 196653 0.816 5BApvdlSsPBHVr5i0ItN50 0.000000 8 0.1510 -4.028 0 0.1250 120.111 4 https://api.spotify.com/v1/tracks/5BApvdlSsPBH... audio_features spotify:track:5BApvdlSsPBHVr5i0ItN50 0.7280

75800 rows × 19 columns
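Before the identifier columns are dropped, it may be worth checking for missing values and for tracks that appear more than once: if the same track is present in several rows, copies can fall into both the training and test sets and make the test score look better than it really is. This sketch builds a tiny hypothetical stand-in frame so it runs on its own; on the notebook's data the same calls would be applied to `df` before In [3].

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame: track "a1" appears twice, "c3" has a gap.
df = pd.DataFrame({
    "id": ["a1", "b2", "a1", "c3"],
    "danceability": [0.761, 0.852, 0.761, None],
    "valence": [0.710, 0.907, 0.710, 0.533],
})

print(df.isna().sum())                           # missing values per column
print("duplicated ids:", df["id"].duplicated().sum())

# Keep one row per track, then drop rows that still have missing values.
clean = df.drop_duplicates(subset="id").dropna()
print(clean)
```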

In [3]:
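# Drop the leftover index column and the ID, URL, and constant 'type' columns, which carry no information about the audio itself.
df.drop(['Unnamed: 0', 'analysis_url', 'id', 'track_href', 'type', 'uri'], 
         inplace=True, axis=1)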

df.head()
Out[3]:
acousticness danceability duration_ms energy instrumentalness key liveness loudness mode speechiness tempo time_signature valence
0 0.40000 0.761 222560 0.838 0.000000 4 0.176 -3.073 0 0.0502 93.974 4 0.710
1 0.18700 0.852 195840 0.773 0.000030 8 0.159 -2.921 0 0.0776 102.034 4 0.907
2 0.05590 0.832 209453 0.772 0.000486 10 0.440 -5.429 1 0.1000 96.016 4 0.704
3 0.00431 0.663 259196 0.920 0.000017 11 0.101 -4.070 0 0.2260 99.935 4 0.533
4 0.55100 0.508 205600 0.687 0.000003 0 0.126 -4.361 1 0.3260 180.044 4 0.555

Random Forest

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
In [5]:
features = ["acousticness", "danceability", "duration_ms", "energy", "instrumentalness", "key", "liveness", "loudness", "mode", "speechiness", "tempo", "time_signature"]
target = ["valence"]
X = df[features]
y = df.valence
In [6]:
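# valence is a continuous score between 0 and 1, so it is modelled with RandomForestRegressor.
# Casting it to int (below) to use RandomForestClassifier would turn almost every value into 0, so it is left commented out.
# y = np.array(y, dtype=int)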
In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
In [8]:
forest = RandomForestRegressor(n_estimators=10, random_state=42)
forest.fit(X_train, y_train)

#For regression the split criterion is MSE (the variance of valence within a node) rather than Gini impurity, which is used for classification. Both measure node impurity, and each split is chosen to reduce it as much as possible.
Out[8]:
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
           oob_score=False, random_state=42, verbose=0, warm_start=False)
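The two impurity measures mentioned in the comment in In [8] can be computed directly. These are the textbook definitions written as small hypothetical helpers, applied to made-up node contents.

```python
import numpy as np

def mse_impurity(values):
    # Regression impurity: mean squared deviation from the node mean (the variance).
    values = np.asarray(values, dtype=float)
    return float(np.mean((values - values.mean()) ** 2))

def gini_impurity(labels):
    # Classification impurity: chance that two labels drawn at random
    # (with replacement) from the node disagree, i.e. 1 - sum(p_k^2).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

node = [0.2, 0.4, 0.9]              # made-up valence values reaching a node
print(mse_impurity(node))           # about 0.0867
print(gini_impurity([0, 0, 1, 1]))  # 0.5, the most mixed two-class node
print(mse_impurity([0.7, 0.7]))     # 0.0, a pure node
```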
In [29]:
y_pred = forest.predict(X_test)
print(y_pred)

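# score() returns R^2 (coefficient of determination) for a regressor, not classification accuracy.
# i is the number of trees in the forest; no random_state is set, so scores vary between runs.
for i in range(1,11):
    model = RandomForestRegressor(n_estimators=i)
    model.fit(X_train, y_train)
    print("Model accuracy for each tree",i," is : ",model.score(X_test, y_test))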
[0.717 0.521 0.71  ... 0.797 0.594 0.737]
Model accuracy for each tree 1  is :  0.9926936708639009
Model accuracy for each tree 2  is :  0.9929400594751562
Model accuracy for each tree 3  is :  0.9951752316923052
Model accuracy for each tree 4  is :  0.9949888535283877
Model accuracy for each tree 5  is :  0.9957478567099375
Model accuracy for each tree 6  is :  0.9955444070754164
Model accuracy for each tree 7  is :  0.9953176506595106
Model accuracy for each tree 8  is :  0.995494440131859
Model accuracy for each tree 9  is :  0.9953870802544358
Model accuracy for each tree 10  is :  0.9956540623559589
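The assignment also asks for cross validation with the score of every split. A minimal sketch with `KFold` and `cross_val_score` follows; on the notebook's data, `X` and `y` from In [5] would be passed instead of the synthetic stand-in created here with `make_regression` (an assumption so the block runs on its own). For a regressor the per-fold score is R², not classification accuracy, and the candidate forest sizes are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in with 12 features, mirroring the 12 audio features in In [5].
X, y = make_regression(n_samples=1000, n_features=12, noise=10.0, random_state=0)

# 5 shuffled folds; each fold is held out once while the other four train the model.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=42)

scores = cross_val_score(forest, X, y, cv=kf, scoring="r2")
for fold, s in enumerate(scores, start=1):
    print(f"fold {fold}: R^2 = {s:.4f}")
print(f"mean R^2 = {scores.mean():.4f} (std {scores.std():.4f})")

# Compare a few forest sizes (illustrative candidates) by their mean CV score.
cv_means = {
    n: cross_val_score(RandomForestRegressor(n_estimators=n, random_state=42),
                       X, y, cv=kf, scoring="r2").mean()
    for n in (5, 10, 50)
}
best_n = max(cv_means, key=cv_means.get)
print("mean CV R^2 by n_estimators:", cv_means, "-> best:", best_n)
```

Choosing the model by mean CV score, rather than by a single train/test split, makes the choice less dependent on one particular split.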
In [26]:
estimator = forest.estimators_[5]
In [27]:
from sklearn.tree import export_graphviz

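# class_names only applies to classification trees, so it is not passed for this regression tree.
export_graphviz(estimator, 
                out_file='tree.dot', 
                feature_names = features,
                rounded = True, proportion = False, 
                precision = 2, filled = True)

# Converting the .dot file to PNG requires the Graphviz `dot` program on the PATH.
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])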
from IPython.display import Image
Image(filename = 'tree.png')

#https://zhuanlan.zhihu.com/p/47720031
#https://zhuanlan.zhihu.com/p/51165358

#Each node shows its MSE (the variance of valence among the samples in that node), the regression counterpart of Gini impurity.
#Because the trees are grown without a depth limit (max_depth=None), splitting continues until the MSE reaches 0 in the leaves.
#`samples` is the number of training samples that reach that node.
Out[27]:
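A plot that could accompany the results is the forest's impurity-based feature importances. This sketch assumes synthetic data so it runs on its own; the audio-feature names are attached to the synthetic columns purely as labels, so the resulting ranking says nothing about the real dataset. On the notebook's data, the fitted `forest` from In [8] would be used directly.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

features = ["acousticness", "danceability", "duration_ms", "energy",
            "instrumentalness", "key", "liveness", "loudness", "mode",
            "speechiness", "tempo", "time_signature"]

# Synthetic stand-in data (labels only); replace with the notebook's fitted forest.
X, y = make_regression(n_samples=500, n_features=len(features), n_informative=4, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

importances = forest.feature_importances_  # impurity-based, normalized to sum to 1
order = np.argsort(importances)

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(np.array(features)[order], importances[order])
ax.set_xlabel("Impurity-based importance (mean decrease in MSE)")
ax.set_title("Random forest feature importances")
fig.tight_layout()
fig.savefig("feature_importances.png")
```

Impurity-based importances can favour features with many distinct values; permutation importance on held-out data is a common cross-check.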